1. INTRODUCTION

The electrocardiogram (ECG) is one of the best-known tools for the diagnosis and treatment of
cardiovascular diseases [1], and cardiac arrhythmia is known as the most common heart disorder. If
patients' vital signs are reported in a timely manner, specialists can identify the main weakness of the
cardiovascular system, prevent severe cardiac events, and choose the most suitable treatment method [2].
Classification of heartbeats based on the features extracted from ECG signals plays a vital role in the
detection of acute cardiovascular disorders [3]. However, given human rights standards and the need to
preserve the privacy of personal data, accessing this data is quite difficult. Our studies led us to federated
learning (FL) as a way to access more data for diagnosis and improve the accuracy of artificial intelligence
models. FL is a novel approach presented to the Association for the Advancement of Artificial Intelligence
(AAAI) by Kairouz et al. [4] in 2016. It can involve data from multiple clients at different centers in the
training process and follows distributed training [5]. This nascent concept allows several distributed devices
to jointly train artificial intelligence models while complying with the data privacy principle [6].

We present a new approach for accurately detecting arrhythmias and recognizing cardiac disorders from
ECG signals received from multiple centers. One of the advantages of FL over centralized learning is that
neural network models are trained without the need to collect data at a centralized location [5]. Our goal
in FL is to reduce the global loss function during training. FL allows multiple clients to participate in the
learning process. In each communication round, the clients send only their training parameters to the server,
which aggregates them into a global model, without sharing even a single unit of the training data [7]. What
is unacceptable, however, is the sharp drop in accuracy that FL suffers when imbalanced data is received from
heterogeneous clients. This imbalance can be due to non-IID labels or to differing data volumes [8].

The FedAvg algorithm is known as the aggregation method in the server [9]. It aggregates the weights
received from the clients in the server. Clients whose data volumes are close to each other (e.g., 35% of the
data for the first client, 40% for the second, and 25% for the third) indicate a balanced composition of the
data. In this case, the training will progress naturally and will not cause a drop in accuracy on the client
side. A large disparity in the data volumes of different clients (e.g., 90% for the first client, 7% for the
second, and only 3% of the total data for the third) indicates a high imbalance in the available data. This
imbalance causes a severe drop in the accuracy of the aggregated weights after training with the available
data.
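
The effect of the data proportions on the aggregate can be illustrated with a minimal sketch. It assumes the usual FedAvg weighting, in which each client contributes in proportion to its share of the data. The function name `fedavg` and the toy weight vectors are hypothetical and serve only as an illustration; they are not the models used in this study.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average client weight vectors, weighting each by its share of the data."""
    sizes = np.asarray(client_sizes, dtype=float)
    shares = sizes / sizes.sum()           # fraction of the total data per client
    stacked = np.stack(client_weights)     # shape: (num_clients, num_params)
    return shares @ stacked                # data-weighted average of the weights

# Toy weight vectors for three clients (made-up numbers, for illustration only).
weights = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]

balanced = fedavg(weights, [35, 40, 25])    # proportions from the balanced example
imbalanced = fedavg(weights, [90, 7, 3])    # proportions from the imbalanced example
print(balanced, imbalanced)
```

In the imbalanced case the aggregate is almost identical to the first client's weights, so the two small clients have little influence on the global model.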

In this study, a drop in the model accuracy in group detection is prevented by providing a suitable
aggregation method. Assuming that the distribution of the data in the clients follows a multivariate Gaussian
distribution, we transfer the values obtained from local training to the server in the form of a mean and
covariance instead of transferring the mean and variance of each model. This transfer strategy can greatly
reduce the disproportion caused by imbalanced data in training and leads to a suitable aggregation. We also
expect the sampling technique to help in this direction. By excluding a few pieces of this large space of
possibilities, we create a dead end that hinders reverse engineering of the data and keeps it from being stolen
by attackers. We perform the sampling with the stochastic weight averaging-Gaussian (SWAG) technique, which
takes a weighted average of all the parameters. We call our method federated learning based stochastic
averaging weights (FEDSWAG). It reaches an accuracy of 98.59%, which is favorable compared to previous
approaches. The innovations of this study include:
- Using the new FL approach to diagnose cardiac abnormalities;
- Providing a strong aggregation approach to solve the imbalanced data problem; and
- Applying a sampling technique for the first time to an ECG dataset.


2. RELATED LITERATURE 
2.1. Deep learning tasks for ECG signal classification 

In [10], models are first pre-trained on a dataset in an unsupervised manner and then fine-tuned in a
supervised way. While the main focus of that study is the application of self-supervised learning for
effective ECG learning, several other aspects are investigated in order to increase efficiency. In [11],
several techniques, including knowledge-based features and supervised pre-training, are employed to achieve
maximum accuracy and stability of heartbeat classification under weak supervision. The study in [12] provides
a classifier that calculates specific properties, including PQ time, QTc, and the Q-Q interval, within the
network. In addition to the features derived during these calculations, the bottleneck layer of the U-Net
network is proposed as an alternative for classification. The study in [13] presents a fusion approach that
combines the two-event-related moving average (TERMA) with the fractional Fourier transform (FrFT) algorithm
in order to accurately identify normal and abnormal morphological properties in electrocardiogram signals.
Subasi and Ercelebi [14] use a wavelet transform approach for classification with an artificial neural network
(ANN) and logistic regression. They use the wavelet transform in order to speed up the pre-processing
calculations. Lotte et al. [15] evaluate and analyze different algorithms for the classification of
electrocardiogram signals. Aziz et al. [16] use a new moving average algorithm based on TERMA together with
the FrFT for better analysis of the ECG signals. The TERMA algorithm identifies specific points of the signal
that lie on a peak, while the FrFT rotates the ECG signal in the time-frequency plane in order to find the
locations of the different peaks of the signal. Mathunjwa et al. [17] present a new deep learning approach
based on 2-second segments of two-dimensional images associated with the peak diagram of the ECG signals for
the correct classification of arrhythmia. In the first step of this approach, the noise and ventricular
fibrillation categories are separated out. Rahul and Sharma [18] suggest a method for the classification of
heart problems, e.g., atrial fibrillation (Afib), ventricular fibrillation (Vfib), ventricular tachycardia
(Vtec), and normal rhythm (N), using a hybrid model based on a 1-D convolutional neural network (CNN) and
bidirectional long short-term memory (Bi-LSTM). Ramkumar et al. [19] use a deep convolutional neural network
that is hierarchically organized as a tree in order to speed up classification and training. They use MLGK to
decrease the risk of overfitting and to improve the visualization of the existing data. Kobat et al. [20] use
the PatNet54 index to create a graph-based extractor, named the prism pattern. Mohonta et al. [21] develop a
deep learning approach to automatically detect cardiac abnormalities from the ECG signal using the continuous
wavelet transform (CWT). In the proposed model, training and testing are performed on a 2-dimensional
convolutional neural network in order to detect five types of heartbeats.


2.2. Aggregation in FL 

In Chen and Chao [22], the proposed approach increases the communication and computation overhead by
sending two vectors to the server in the last round of training, since high bandwidth and a powerful
processor are needed to transfer this volume of data to the server. Shoham et al. [23] and Zenke et al. [24]
combine the inverse of the diagonal of the empirical Fisher information matrix with the Laplace approximation
to approximate the mean and covariance of the posterior probability. In addition, the researchers in [24]
present a synaptic framework to address the continual learning problem and use the Hessian matrix to
approximate the loss functions [25]-[27].


3. PROPOSED METHOD 
3.1. Choosing the best optimizer for the server 

In the early stages of training the clients, it is vital to reach the best optimal point in the training
process with a lower loss. Thus, our focus is on using a strong optimizer in order to reach the optimal point
when training with imbalanced data. The volume of data samples in each client is acceptable for the device
itself, so no drop in the model accuracy caused by imbalanced data is encountered locally. At the stage of
aggregating the weights from different clients, however, the accuracy of the final server output decreases
because of the imbalanced data and the different weighted averages received from the parameters. Stochastic
gradient descent (SGD) is generally used as the optimizer in deep learning. In FL, this optimizer can find the
optimized weights that achieve maximum learning when the training data is distributed among the clients in
balanced proportions. The problem in this study is the poor performance of SGD on imbalanced data. Therefore,
we looked for an optimizer that fits the imbalanced data situation and found the stochastic weight averaging
(SWA) optimizer. SWA is a stochastic weight average that follows a modified (constant or cyclical) learning
rate schedule. This optimizer uses a solution previously obtained by SGD as a pre-trained solution. SWA takes
a weighted average of all the trained models and starts its training from it. With a different learning rate,
it finds a set of solutions that converge toward the best optimal points, averages the solutions it has found
itself together with the solution previously found by SGD, and presents the final optimal point. Since this
optimizer finds a more stable solution than SGD does when training with different data distributions, it
outperforms SGD in both the training and testing stages of the model. Its second advantage is the reduced gap
between the accuracies obtained in the training and testing phases. Using the SWA optimizer, we thus obtain
more reliable accuracy and loss values from the different stages of execution.
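
The averaging step described above can be sketched as follows. This is a simplified illustration and not the training code of this study: it assumes a toy quadratic loss, a constant learning rate in the averaging phase, and an equal-weight running average that includes the SGD starting point. The names `sgd_then_swa` and `noisy_grad` are hypothetical.

```python
import numpy as np

def sgd_then_swa(w0, grad_fn, sgd_steps, swa_steps, lr):
    """Run plain SGD, then continue SGD while averaging the visited weights."""
    w = np.array(w0, dtype=float)
    # Phase 1: plain SGD provides the pre-trained starting solution.
    for _ in range(sgd_steps):
        w = w - lr * grad_fn(w)
    # Phase 2: keep a running average of the SGD solution and every new iterate.
    w_swa = w.copy()
    n_avg = 1
    for _ in range(swa_steps):
        w = w - lr * grad_fn(w)
        w_swa = (w_swa * n_avg + w) / (n_avg + 1)
        n_avg += 1
    return w, w_swa

rng = np.random.default_rng(0)

def noisy_grad(w):
    """Gradient of the toy loss 0.5 * ||w||^2 plus noise that mimics minibatch sampling."""
    return w + rng.normal(scale=0.5, size=w.shape)

w_sgd, w_swa = sgd_then_swa([5.0, -3.0], noisy_grad, sgd_steps=200, swa_steps=200, lr=0.1)
print(np.linalg.norm(w_sgd), np.linalg.norm(w_swa))
```

Because the last SGD iterate keeps fluctuating around the optimum while the average smooths these fluctuations out, the averaged weights are typically closer to the optimum of the toy loss than the last iterate.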


3.2. Visual sampling in aggregation step according to SWA 

Figure 1 shows a view of the approach proposed by this study. Our goal in FL is to achieve a proper
accuracy and ultimately maximum learning. Investigating this issue is essential in systems where detection
certainty matters, in areas such as medicine, driving, and space. Another goal is to accurately estimate the
training status of each element participating in the FL process. Approximating the global posterior in the
server is necessary to know how well it has learned. The posterior probability cannot be computed by general
mathematical methods and calculations; the most important reason for its intractability in FL is the lack of
access to the data in the clients. We therefore use the multivariate Gaussian distribution to estimate the
posterior probability. To transfer the average weights to the server, we use the multivariate Gaussian
distribution given in relation (1). We seek to solve the problem of imbalanced data by avoiding the transfer
of mean and variance. In our initial tests, we found that using the transferred mean and variance as the
aggregate parameters in the server leads to a drop in the model efficiency. Therefore, by using the mentioned
distribution and transferring the mean and covariance to the server, the problem of the maximum difference of
the parameters is solved.


$$p(x \mid D) = q_g(x) = \mathcal{N}(x \mid \mu_g, \Sigma_g) \tag{1}$$

Using relation (1), we estimate the global posterior of the server with a multivariate Gaussian distribution
in order to create a maximum learning platform in the server. After the initial evaluation and testing,
however, it was found that updating the covariance in all the stages leads to excessive memory consumption in
the server. Thus, we exchanged parameters from the clients to the server using the cross-covariance in the
form of the SWAG formula as (2).


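$$\overline{\pi^2} = \frac{1}{n}\sum_{i=1}^{n} \pi_i^2, \qquad \Sigma_{\text{diag}} = \mathrm{diag}\left(\overline{\pi^2} - \pi_{\text{SWA}}^2\right) \tag{2}$$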


The sampling using the SWAG is evaluated as (3). 
$$\mu = \sum_i \frac{|C_i|}{|C|}\,\pi_i, \qquad \Sigma_{\text{diag}} = \mathrm{diag}\left(\sum_i \frac{|C_i|}{|C|}\,(\pi_i - \mu)^2\right) \tag{3}$$


Relation (3) is a general formulation for SWAG. Samples can be transferred directly from the client to the
server as the initial parameters for execution in the current round. In relation (3), $C_i$ denotes the data
associated with the i-th client (Client i). The cross-covariance is updated using the optimum obtained from
SWA. With the term $(\pi_i - \mu)^2$, we square each element in order to obtain, after normalization, a
proportional ratio in the range between zero and one. In section 4, we discuss in detail the results of
applying the proposed approach.
Figure 1. Schematic of the proposed method and communication between the clients and the server
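
The aggregation and sampling of section 3.2 can be sketched as follows. The sketch assumes that relation (3) is the data-weighted mean and diagonal covariance of the client parameters and that global models are sampled from the resulting Gaussian. The function names, the toy parameter vectors, and the client sizes are hypothetical and are not taken from the experiments.

```python
import numpy as np

def gaussian_aggregate(client_params, client_sizes):
    """Return the data-weighted mean and diagonal covariance of client parameters."""
    sizes = np.asarray(client_sizes, dtype=float)
    shares = sizes / sizes.sum()                 # |C_i| / |C|
    params = np.stack(client_params)             # shape: (num_clients, num_params)
    mu = shares @ params                         # weighted mean of the parameters
    sigma_diag = shares @ (params - mu) ** 2     # weighted element-wise squared deviation
    return mu, sigma_diag

def sample_global_models(mu, sigma_diag, num_samples, seed=0):
    """Draw parameter vectors from N(mu, diag(sigma_diag))."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((num_samples, mu.size))
    return mu + np.sqrt(sigma_diag) * noise

# Toy client parameters (illustration only): the clients disagree on the first
# coordinate and agree on the second.
params = [np.array([0.0, 2.0]), np.array([1.0, 2.0]), np.array([2.0, 2.0])]
mu, sigma_diag = gaussian_aggregate(params, client_sizes=[1, 2, 1])
samples = sample_global_models(mu, sigma_diag, num_samples=4)
print(mu, sigma_diag)
print(samples)
```

Only a mean vector and a diagonal of the same length are stored on the server, which keeps the memory cost linear in the number of parameters. Coordinates on which the clients agree have zero variance, so every sample reproduces them exactly.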


4. EXPERIMENTAL RESULTS 
4.1. Preparation of results 
4.1.1. Settings and parameters of the proposed method 

In this study, the VGG19 model is used to evaluate the proposed method presented in section 3. The
training is performed for 100 communication rounds with a batch size of 64. Five clients are employed for
testing. A learning rate of Lr = 0.001 is used in the initial rounds. Following the schedule presented in SWA,
the rate is adjusted as the communication rounds progress. We found that when the learning rate is very large,
convergence requires more communication rounds, and that with a small learning rate, a proper accuracy cannot
be achieved in fewer communication rounds. This is why we adopted a learning rate that follows the schedule.
In each training round, we increase the sampling by 10% of the total dataset.
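
As an illustration of a round-dependent learning rate, the following sketch implements one common cyclical form used with SWA, in which the rate decays linearly from a high to a low value within each cycle and then restarts. Only the initial rate of 0.001 and the 100 communication rounds come from the settings above; the lower rate `lr_low`, the cycle length `cycle_len`, and the function name `cyclical_lr` are assumptions made for the example.

```python
def cyclical_lr(round_idx, lr_high, lr_low, cycle_len):
    """Linearly decay from lr_high to lr_low within each cycle, then restart."""
    t = ((round_idx - 1) % cycle_len + 1) / cycle_len   # position in the cycle, in (0, 1]
    return (1.0 - t) * lr_high + t * lr_low

# lr_high follows the stated initial rate; lr_low and cycle_len are hypothetical.
schedule = [cyclical_lr(r, lr_high=1e-3, lr_low=1e-4, cycle_len=10) for r in range(1, 101)]
print(schedule[:12])
```

With this form, the weights reached at the end of each cycle, where the rate is lowest, are natural candidates for the averaging step.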


4.1.2. Datasets 

We use the ECG heartbeat categorization dataset that is available on Kaggle. It consists of two
collections of heartbeat signals derived from two heartbeat classification datasets: the Massachusetts
Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia dataset and the Physikalisch-Technische
Bundesanstalt (PTB) database. The size of the dataset is sufficient for proper model training and for
avoiding overfitting. This dataset has been used for heartbeat classification with deep neural network
architectures and the transfer learning technique [28], [29]. The signals correspond to the ECG shapes of
heartbeats for normal cases and for cases affected by various arrhythmias. These signals are segmented in the
pre-processing stage, and each segment corresponds to one heartbeat. Figure 2 presents a view of the selected
dataset.




Figure 2. Schematic of the dataset of cardiac ECG signals 


4.1.3. Evaluation criteria 

In order to analyze the test results, we use four evaluation metrics generally used for classification
models: ACCecg (accuracy), PRECecg (precision), SENecg (sensitivity), and SPEecg (specificity).


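$$ACC_{ecg} = \frac{TP_{cl} + TN_{cl}}{TP_{cl} + TN_{cl} + FP_{cl} + FN_{cl}} \tag{4}$$

$$PREC_{ecg} = \frac{TP_{cl}}{TP_{cl} + FP_{cl}} \tag{5}$$

$$SEN_{ecg} = \frac{TP_{cl}}{TP_{cl} + FN_{cl}} \tag{6}$$

$$SPE_{ecg} = \frac{TN_{cl}}{TN_{cl} + FP_{cl}} \tag{7}$$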
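
The four criteria can be computed from the confusion counts of a class as in the following sketch, which assumes the standard definitions of accuracy, precision, sensitivity, and specificity. The function name `ecg_metrics` and the counts in the example are hypothetical and are not results of this study.

```python
def ecg_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, sensitivity, and specificity from confusion counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "PREC": tp / (tp + fp),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
    }

# Made-up counts for one class, for illustration only.
m = ecg_metrics(tp=90, tn=80, fp=20, fn=10)
print(m)
```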


4.2. Evaluation of the proposed method

Three clients were used to train the model in the execution process. The results are depicted as graphs
(Figure 3 and Figure 4), and the details are presented in Table 1. The results of evaluating the proposed
method on the mentioned dataset show that the server achieves the highest accuracy. In the testing phase, the
first client has the highest accuracy, followed by the second client with an accuracy of nearly 80%.


4.3. Comparison of the proposed method versus other methods 

In this section, the proposed method is compared with previous approaches, using accuracy as the
comparison criterion. Comparisons were made for epochs between 1 and 100, and the accuracy value was reported
at each point. As shown in Figure 5, our proposed method reached an accuracy of 87.98% in the best case, which
is the highest accuracy compared to the previous FL diagnostic methods in the simplest case.
Figure 3. Results of evaluation in the training process with evaluation criteria for clients and server


Figure 4. Results of evaluation of the proposed method in the testing phase 
Table 1. Client and server results with evaluation criteria


Type     PREC   SPE    SEN    ACC
Server   98.64  96.39  93.19  98.87
Client1  96.78  98.67  93.49  97.67
Client2  96.15  93.73  97.27  92.43
Client3  97.79  97.32  99.94  97.05
Figure 5. Comparison of the proposed method versus previous ones
5. CONCLUSION

In this study, we proposed an FL approach with a reasonable accuracy for arrhythmia detection from ECG
signals. To this end, we used the sampling technique in order to better aggregate the parameters in the
server. With this approach, the server achieved a good accuracy compared to previously proposed methods.
Future work includes consolidating the privacy preservation platform and providing new aggregation strategies
that require fewer communication rounds. Another idea for future work is to use joint learning for better
detection.